Comments for MEDB 5501, Week 13

Covariance

  • \(Cov(X,Y)=\frac{1}{n-1}\sum(X_i-\bar{X})(Y_i-\bar{Y})\)
    • \((X_i-\bar{X})(Y_i-\bar{Y})\) is positive if
      • \(X_i\) and \(Y_i\) both above average
      • \(X_i\) and \(Y_i\) both below average
    • \((X_i-\bar{X})(Y_i-\bar{Y})\) is negative if
      • \(X_i\) above average and \(Y_i\) below average
      • \(X_i\) below average and \(Y_i\) above average

  x  y
 11 13
 15  9
 19 15
 21  7
 25 11
 29  5

\(\bar{X}=20\); \(\bar{Y}=10\); \(S_X=6.5\); \(S_Y=3.7\)

Calculation of covariance

 x_centered y_centered product
         -9          3     -27
         -5         -1       5
         -1          5      -5
          1         -3      -3
          5          1       5
          9         -5     -45

  • \(Cov(X,Y)=\frac{1}{5}(-70)=-14\)
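
The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch using the six pairs from the table; the variable names are chosen here for illustration.

```python
# Six (x, y) pairs from the data table.
x = [11, 15, 19, 21, 25, 29]
y = [13, 9, 15, 7, 11, 5]

n = len(x)
x_bar = sum(x) / n  # 20.0
y_bar = sum(y) / n  # 10.0

# Cross-product of the centered values for each pair.
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

# Divide by n - 1, matching the sample covariance formula.
cov_xy = sum(products) / (n - 1)
print(sum(products), cov_xy)  # -70.0 -14.0
```

Four of the six products are negative, which is why the covariance comes out negative.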

Correlation

  • \(Corr(X,Y)=\frac{Cov(X,Y)}{S_XS_Y}\)
    • Also written as \(r_{XY}\)
    • Population correlation is \(\rho_{XY}\)

Calculation of correlation

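  • \(r_{XY}=\frac{-14}{\sqrt{42.8} \times \sqrt{14}}=-0.571929\)
    • Uses the unrounded \(S_X\) and \(S_Y\); the rounded values 6.5 and 3.7 give \(-0.58\)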
    • Always round!
      • \(r_{XY}=-0.57\) or \(-0.6\)
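
As a rough check on the arithmetic, the sketch below recomputes the correlation from the same six pairs using only the standard library. The variable names are illustrative, and the standard deviations are kept unrounded until the final step.

```python
import math

x = [11, 15, 19, 21, 25, 29]
y = [13, 9, 15, 7, 11, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample standard deviations (divisor n - 1).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r_xy = cov_xy / (s_x * s_y)

# Round only when reporting.
print(round(s_x, 1), round(s_y, 1), round(r_xy, 2))  # 6.5 3.7 -0.57
```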

Interpretation of correlation

  • r is always between -1 and +1
    • Positive values imply positive association
    • Negative values imply negative association
    • Strongest associations closest to -1 or +1

r between -1 and -0.7, strong negative association

r between -0.7 and -0.3, weak negative association

r between -0.3 and +0.3, little or no association

r between +0.3 and +0.7, weak positive association

r between +0.7 and +1, strong positive association
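
The five bands above can be written as a small lookup function. This is a sketch only: the function name is hypothetical, and the handling of values exactly at ±0.3 and ±0.7 is an assumption, since the bands as listed share their endpoints.

```python
def describe_correlation(r):
    """Map a correlation to the verbal labels used in these notes."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and +1")
    if abs(r) < 0.3:
        return "little or no association"
    # Assumption: exactly 0.3 counts as weak and exactly 0.7 counts as strong.
    strength = "strong" if abs(r) >= 0.7 else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} association"


print(describe_correlation(-0.57))  # weak negative association
```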

Extreme case, perfect association

Sleep data dictionary, 1 of 6

---
data_dictionary:
  sleep.txt
  
source:
  This dataset is part of the Australasian Data and
  Story Library (OzDASL). Please cite this data as
  Smyth, GK (2011). Australasian Data and Story
  Library (OzDASL). http://www.statsci.org/data.
  The data comes originally from Allison, T., and
  Cicchetti, D. V. (1976). Sleep in mammals:
  ecological and constitutional correlates.
  Science 194 (November 12), 732-734.

Sleep data dictionary, 2 of 6

description:
  This dataset has information about sleep patterns
  in 62 common mammals, along with other information
  that might help you understand what influences
  variations in sleep.
  
download:
  text-format: http://www.statsci.org/data/general/sleep.txt
  additional-information: http://www.statsci.org/data/general/sleep.html

copyright:
  There is no information about the copyright for this
  dataset. You should, however, be able to use this
  data for individual educational purposes under the
  Fair Use guidelines of U.S. copyright law.

Sleep data dictionary, 3 of 6

format: 
  delimiter: tab
  varnames: included in the first row of data
  missing-value-code: NA
  rows: 62
  columns: 11

Sleep data dictionary, 4 of 6

vars:
  Species:
    label: Species of mammal
    
  BodyWt:
    label: Body weight
    unit: kg
    
  BrainWt:
    label: Brain weight
    unit: g
    

Sleep data dictionary, 5 of 6

  NonDreaming:
    label: Time spent in non-dreaming sleep
    unit: hours
    
  Dreaming:
    label: Time spent in dreaming sleep
    unit: hours
    
  TotalSleep:
    label: Total time spent in sleep
    unit: hours
    
  LifeSpan:
    unit: years
    

Sleep data dictionary, 6 of 6

  Gestation:
    unit: days

  Predation:
    scale: likert
    range: 1-5

  Exposure:
    scale: likert
    range: 1-5
    
  Danger:
    scale: likert
    range: 1-5
---

What does a missing value represent

  • Dropout
  • Refusal to answer a survey question
  • Survey question is not applicable
  • Lab result is lost
  • Concentration below detectable limit
  • Many other reasons

Common missing value codes

  • A single dot (.)
    • SPSS and SAS
  • NA
    • R
  • Asterisk (*) and other symbols
  • Unusual number codes (-1, 9, 99, 999)

Importing missing values

  • No problems when the file uses the software's default missing value code
  • NA and * cause a numeric variable to import as a string
    • Fix during import, or
    • Convert back after import
  • Unusual number codes
    • Designate as missing after import
    • Don’t forget!
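
Outside SPSS, the same import issue can be handled by naming the missing value codes explicitly while reading the file. The sketch below uses Python's standard csv module on two toy rows that are made up for illustration (they are not taken from sleep.txt); the helper name to_number and its numeric_codes parameter are hypothetical.

```python
import csv
import io

# Text codes that should be read as missing rather than as strings.
MISSING_CODES = {"NA", ".", "*", ""}

# Toy tab-delimited rows, invented for illustration only.
raw = "Species\tBodyWt\tDreaming\nAnimal A\t1.5\t2.0\nAnimal B\t60.0\tNA\n"


def to_number(text, numeric_codes=()):
    """Convert text to float, returning None for any missing value code."""
    text = text.strip()
    if text in MISSING_CODES:
        return None
    value = float(text)
    # Unusual number codes (such as 99) must be designated explicitly.
    return None if value in numeric_codes else value


rows = []
for record in csv.DictReader(io.StringIO(raw), delimiter="\t"):
    rows.append({
        "Species": record["Species"],
        "BodyWt": to_number(record["BodyWt"]),
        "Dreaming": to_number(record["Dreaming"]),
    })

print(rows[1]["Dreaming"])  # None
```

Number codes are the risky case: unless they are passed in numeric_codes, a value such as 99 is silently treated as real data.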

Imputing missing values, 1 of 2

  • Several simple (simplistic?) imputation choices
    • No news is bad news
    • No news is good news
    • No news is average news (MCAR)
    • No news is last week’s news (LOCF)
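
Two of these simple choices are easy to sketch in Python. The function names and the weekly values below are made up for illustration, and missing values are represented by None.

```python
def impute_mean(values):
    """'No news is average news': fill gaps with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_locf(values):
    """'No news is last week's news': carry the last observation forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


weekly = [4.0, None, 6.0, None, 8.0]  # made-up measurements
print(impute_mean(weekly))  # [4.0, 6.0, 6.0, 6.0, 8.0]
print(impute_locf(weekly))  # [4.0, 4.0, 6.0, 6.0, 8.0]
```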

Imputing missing values, 2 of 2

  • Rigorous approaches (beyond the scope of this class)
    • Missing at random (MAR), Missing not at random (MNAR)
    • Ignorable, Non-ignorable
    • Single/Multiple imputation
    • Maximum likelihood/Bayesian approaches
  • You cannot ignore missingness, and you cannot avoid imputation

SPSS investigation of missing data, 1 of 2

SPSS investigation of missing data, 2 of 2

Missing value approaches for correlations, 1 of 2

\[\begin{matrix} A_1 & B_1 & C_1\\ A_2 & B_2 & C_2\\ A_3 & B_3 & C_3\\ A_4 & B_4 & C_4\\ A_5 & B_5 & C_5\\ A_6 & B_6 & .\\ \end{matrix}\]

Missing value approaches for correlations, 2 of 2

  • Complete case analysis
    • Use 5 pairs for \(r_{AB}\), \(r_{AC}\), and \(r_{BC}\)
  • Pairwise deletion
    • Use 5 pairs for \(r_{AC}\) and \(r_{BC}\)
    • Use 6 pairs for \(r_{AB}\)
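
The difference between the two approaches is just a difference in which rows are counted. The sketch below assumes a six-row layout like the matrix shown earlier, with only the last value of C missing; the numbers themselves are made up and the function names are hypothetical.

```python
# Toy data: six rows, with the last value of C missing (None).
data = {
    "A": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "B": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "C": [9.0, 7.0, 8.0, 5.0, 6.0, None],
}


def complete_case_n(data):
    """Rows where every variable is observed."""
    return sum(all(v is not None for v in row) for row in zip(*data.values()))


def pairwise_n(data, u, v):
    """Rows where both variables in the pair are observed."""
    return sum(a is not None and b is not None for a, b in zip(data[u], data[v]))


print(complete_case_n(data))       # 5
print(pairwise_n(data, "A", "B"))  # 6
print(pairwise_n(data, "A", "C"))  # 5
print(pairwise_n(data, "B", "C"))  # 5
```

Pairwise deletion keeps more data for the A and B pair, at the cost of correlations in the same table being based on different sets of rows.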

SPSS correlation analysis

Partial correlation

  • \(\rho_{XY\cdot Z}=\frac{\rho_{XY}-\rho_{XZ}\rho_{ZY}} {\sqrt{1-\rho_{XZ}^2}\sqrt{1-\rho_{ZY}^2}}\)
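
The formula translates directly into code. The sketch below is a minimal illustration that plugs in three pairwise correlations; the function name and the example values 0.5, 0.3, and 0.4 are made up. In practice the sample correlations \(r\) are substituted for the population values \(\rho\).

```python
import math


def partial_correlation(r_xy, r_xz, r_zy):
    """Correlation of X and Y after adjusting both for Z."""
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_zy ** 2)
    return (r_xy - r_xz * r_zy) / denominator


# Made-up correlations, for illustration only.
print(round(partial_correlation(0.5, 0.3, 0.4), 3))  # 0.435
```

When Z is uncorrelated with both X and Y, the partial correlation equals the ordinary correlation, because the numerator and denominator adjustments both vanish.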